Automated data modeling for abbreviations utilizing fuzzy reasoning logic

ABSTRACT

A method includes analyzing an enterprise data warehouse to determine name attribute scores based on occurrences of enterprise terms and abbreviations in the enterprise data warehouse. The method also includes generating a scoring summary of phrases, applying fuzzy reasoning logic to identify one or more relationship patterns and weights for the phrases including at least one shared word to produce training data for a data model associated with the enterprise data warehouse, and updating the data model with a new abbreviated field name associated with a new field name based on identifying a closest match of the new field name with the training data.

CROSS-REFERENCE TO RELATED APPLICATIONS

This is a continuation of U.S. application Ser. No. 16/896,715, filed Jun. 9, 2020, the disclosure of which is incorporated herein by reference in its entirety.

BACKGROUND

Enterprise data warehouses can store a vast amount of data generated from multiple applications, files, and other sources. Various applications and associated databases can have locally defined rules and other constraints about how underlying data is managed and presented. Data modelers can manually develop data models by analyzing documents to understand data attributes and design coherent tables. As data models are developed, data modelers strive to adhere to enterprise modeling standards and maintain consistency across each application. As multiple data sources are modeled from across an enterprise, the process of merging data into an enterprise data warehouse can be time consuming, as mapping operations are manually determined to account for differences in formatting, field names, data types, naming conventions, and the like. Merging data produced by multiple applications and associated with different organizations typically requires careful coordination, which can result in extended development time, modification time, and resource utilization.

BRIEF DESCRIPTION OF THE DRAWINGS

The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The features and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 depicts a block diagram of a system according to some embodiments of the present invention;

FIG. 2 depicts a block diagram of a system according to some embodiments of the present invention;

FIG. 3 depicts a user interface for training and data modeling according to some embodiments of the present invention;

FIG. 4 depicts a user interface for training and data modeling in a first state according to some embodiments of the present invention;

FIG. 5 depicts a user interface for training and data modeling in a second state according to some embodiments of the present invention;

FIG. 6 depicts a scoring summary according to some embodiments of the present invention;

FIG. 7 depicts a block diagram of a scoring process according to some embodiments of the present invention;

FIG. 8 depicts a block diagram of a training summary according to some embodiments of the present invention;

FIG. 9 depicts a process flow of updating a data model according to some embodiments of the present invention;

FIG. 10 depicts a user interface for generation of a data model in a first state according to some embodiments of the present invention;

FIG. 11 depicts a user interface for generation of a data model in a second state according to some embodiments of the present invention;

FIG. 12 depicts a user interface for reviewing data model output according to some embodiments of the present invention;

FIG. 13 depicts a user interface of an updated abbreviation list according to some embodiments of the present invention;

FIG. 14 depicts a user interface for generation of a data model in a third state according to some embodiments of the present invention;

FIG. 15 depicts a user interface for creating a data definition language output based on a selected source according to some embodiments of the present invention;

FIG. 16 depicts a user interface with a data definition language output based on a selected source according to some embodiments of the present invention; and

FIG. 17 depicts a process flow according to some embodiments of the present invention.

DETAILED DESCRIPTION

According to an embodiment, a system for automated data modeling is provided. The system can improve computer system performance by automating data model generation and updates. Use of machine learning and fuzzy logic can enable automated data model updates that comply with existing rules and constraints that may be inherently embodied within the system but not formally defined or well understood due to complex relationships between many attributes, data structures, and subsystems. The system can scan through databases and identify underlying changes to be modeled. The system can conduct automatic data type conversions across different database technologies and create a data model to deploy. Training of the machine learning can be customized by organization or other subgroupings with multiple sets of metadata tailored to the organization or other subgroupings. Existing enterprise abbreviation lists can be read, and scores can be determined for each word or phrase based on the usage frequency in each organization or other subgroupings. Based on the scoring, the system can combine different words and create acronyms when a physical name reaches a character count limit. The system can also suggest new abbreviations when none exist and apply the new abbreviations to multiple derivatives of the same word. Further, the system can provide feedback to various users who interface with the system.

A typical data modeling process may include business analysts who inspect various data sources, files, and datasets (e.g., mainframe datasets) to produce a mapping document that defines how various types of data are stored and accessed in systems monitored by the analysts. A data modeler may review the mapping document for changes and review enterprise abbreviations to determine a data definition language format to create or modify one or more tables in an enterprise data warehouse, while avoiding conflicting definitions. An enterprise data warehouse may have hundreds of tables that require frequent maintenance, for instance, as various attributes may be added, modified, or removed. A data modeler can identify, analyze, and map changes to create updates to physical and logical data models. Physical data models can refer to data column names in underlying tables that may have naming conventions based on the technology used to implement the data models, while logical data models may be organized according to naming conventions and preferences of an organization. The process of data model creation and data management can lead to excessive resource utilization, for instance, where mappings are inefficiently assigned to physical resources, where excessive data size is allocated, where unneeded data are not removed, where lapses in naming conventions result in duplicate or unneeded definitions, and/or other such inefficiencies. Embodiments can, for example, increase data management efficiency, reduce processing time needed to complete updates, and identify inconsistencies or potential constraint violations before updates are finalized.

Turning now to FIG. 1, a system 100 is depicted upon which automated data modeling may be implemented. The system 100 includes a plurality of subsystems, such as one or more data modeling servers 102, one or more data source servers 104, and a data warehouse system 106 coupled to a network 110. A plurality of user systems 112 can access content and/or interfaces through the network 110. For example, user system 112A may be configured as a modeler system operable to interface with the data modeling servers 102, while user system 112N may be configured as an analyst system operable to interface directly with the data source servers 104 and access other elements of the system 100 indirectly, such as the data warehouse system 106.

The data source servers 104 can execute applications 114 that interface with various data sources, such as databases 116 and files 118. The databases 116 can be configured in various formats depending on the underlying technologies used to manage the databases 116. Further, applications 114 may have different interfaces and use different approaches to access and update data stored in the databases 116. One or more of the files 118 may be accessed, modified, or generated by the applications 114. For example, files 118 can include various documents, spreadsheets, images, and other such data used by the applications 114 and may be linked or otherwise associated with data stored in the databases 116.

The data warehouse system 106 can be part of an enterprise data warehouse 120 that may include multiple databases 122 configured to store tables 124 of data. The enterprise data warehouse 120 can also store metadata 125 defining one or more aspects of data within the databases 122, such as database schemas. Data stored in databases 122 can be in various formats that can differ from the format and content of data stored in databases 116. The databases 122 can collectively comprise an archival store that merges data from the databases 116 and performs format adjustments as needed. In order to support queries and retrieval of data from the enterprise data warehouse 120, a mapping document 126 can define how data from the applications 114, databases 116, and/or files 118 should be stored in a structured form. Examples of formatting can include field names, data types, field sizes, and other such information defined in the mapping document 126. To avoid conflicts in naming and maintain consistency, while also reducing field size requirements, an enterprise abbreviation list 128 can be defined that indicates how known words or phrases should be abbreviated. As the size of the enterprise abbreviation list 128 grows, it can become difficult for users to form new abbreviations consistently while also avoiding conflicting terms that may result in ambiguity or errant mapping.

Rather than relying on users to manually generate or modify abbreviations associated with the enterprise abbreviation list 128 and manually build a data model 130 that defines the structure of tables 124 and/or other elements of the enterprise data warehouse 120 which are consistent with the mapping document 126, embodiments can use machine learning through a training tool 132 and a data modeling tool 134. The training tool 132 and data modeling tool 134 can execute on one or more of the data modeling servers 102. Training data 136 can be populated using supervised or unsupervised learning. As one example, the training tool 132 can perform a scoring process to generate a scoring summary 138 to produce the training data 136 according to processes further described herein. The training data 136 can be used to develop or refine the data model 130. For instance, the data modeling tool 134 can update the data model 130 with one or more new field names using a new abbreviation determined in part through applying the training data 136. The training tool 132 can learn and apply rules 137 regarding formatting, abbreviations, phrase formation, and other such constraints. Fuzzy reasoning logic can be applied to identify one or more relationship patterns and weights for the phrases including at least one shared word based on the scoring summary 138 to produce a plurality of training data 136 for the data model 130 associated with the enterprise data warehouse 120. The data model 130 can formally describe or update the metadata 125 or other aspects of the enterprise data warehouse 120, such as renaming existing fields or adding new fields consistent with desired mappings captured in the mapping document 126.
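As a non-limiting illustration, the idea of matching a new field name against phrases already present in the training data 136 can be sketched as follows. The sketch substitutes plain string similarity from the Python standard library (difflib) for the fuzzy reasoning logic, whose details are not specified here. The function name closest_known_phrase, the cutoff value, and the sample abbreviations are assumptions made only for this example.

```python
import difflib

def closest_known_phrase(new_name, training_phrases, cutoff=0.6):
    """Return (phrase, abbreviation) for the known phrase most similar to
    new_name, or None when nothing is similar enough.

    training_phrases is a hypothetical dict mapping a known phrase to its
    abbreviation; it stands in for the training data 136.
    """
    matches = difflib.get_close_matches(
        new_name.upper(), list(training_phrases), n=1, cutoff=cutoff
    )
    if not matches:
        return None
    return matches[0], training_phrases[matches[0]]

# Hypothetical training phrases, for illustration only.
training = {
    "PROPERTY DAMAGE": "PD",
    "PROPERTY LIMIT": "PROP_LMT",
    "PERSONAL PROPERTY": "PP",
}
print(closest_known_phrase("Property Damages", training))
```

A similarity cutoff keeps unrelated names from inheriting an abbreviation; names that fall below the cutoff may instead be routed to the creation of a new abbreviation.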

In the example of FIG. 1, each of the data modeling servers 102, data source servers 104, data warehouse system 106, and user systems 112 can include a processor (e.g., a processing device, such as one or more microprocessors, one or more microcontrollers, one or more digital signal processors) that receives instructions (e.g., from memory or like device), executes those instructions, and performs one or more processes defined by those instructions. Instructions may be embodied, for example, in one or more computer programs and/or one or more scripts. In one example, the system 100 executes computer instructions for implementing the exemplary processes described herein. Instructions that implement various process steps can be executed by different elements of the system 100, such as elements of the data modeling servers 102, data source servers 104, data warehouse system 106, and/or user systems 112. Although depicted separately, one or more of the data modeling servers 102, data source servers 104, data warehouse system 106, and/or user systems 112 can be combined or further subdivided.

The user systems 112 may each be implemented using a computer executing one or more computer programs for carrying out processes described herein. In one embodiment, the user systems 112 may each comprise a personal computer (e.g., a laptop, desktop, etc.), a network server-attached terminal (e.g., a thin client operating within a network), or a portable device (e.g., a tablet computer, personal digital assistant, smart phone, etc.). In an embodiment, the user systems 112 are operated by users having the role of analysts of organizations that interact with the data source servers 104 and/or data architects that use the data modeling servers 102 to support data format translation and long-term storage of data from the data source servers 104 in the enterprise data warehouse 120.

Each of the data modeling servers 102, data source servers 104, data warehouse system 106, and user systems 112 can include a local data storage device, such as a memory device. A memory device, also referred to herein as “computer-readable memory” (e.g., non-transitory memory devices as opposed to transmission devices or media), may generally store program instructions, code, and/or modules that, when executed by a processing device, cause a particular machine to function in accordance with one or more embodiments described herein.

The network 110 can include any type of computer communication technology within the system 100 and can extend beyond the system 100 as depicted. Examples include a wide area network (WAN), a local area network (LAN), a global network (e.g., Internet), a virtual private network (VPN), and an intranet. Communication within the network 110 may be implemented using a wired network, an optical network, a wireless network and/or any kind of physical network implementation known in the art. The network 110 can be further subdivided into multiple sub-networks that may provide different levels of accessibility or prevent access to some elements of the system 100. For example, some of the user systems 112 may not have access to the data modeling servers 102 and/or the data warehouse system 106.

FIG. 2 depicts a block diagram of a system 200 according to an embodiment. The system 200 is depicted as embodied in a computer 201 in FIG. 2. The system 200 is an example of one of the data modeling servers 102, data source servers 104, data warehouse system 106, and/or user systems 112 of FIG. 1.

In an exemplary embodiment, in terms of hardware architecture, as shown in FIG. 2, the computer 201 includes a processing device 205 of a processing system and a memory device 210 of a memory system coupled to a memory controller 215 and an input/output controller 235. The input/output controller 235 may comprise, for example, one or more buses or other wired or wireless connections, as is known in the art. The input/output controller 235 may have additional elements, which are omitted for simplicity, such as controllers, buffers (caches), drivers, repeaters, and receivers, to enable communications. Further, the computer 201 may include address, control, and/or data connections to enable appropriate communications among the aforementioned components.

In an exemplary embodiment, a keyboard 250 and mouse 255 or similar devices can be coupled to the input/output controller 235. Alternatively, input may be received via a touch-sensitive or motion sensitive interface (not depicted). The computer 201 can further include a display controller 225 coupled to a display 230.

The processing device 205 comprises a hardware device for executing software, particularly software stored in secondary storage 220 or memory device 210. The processing device 205 may comprise any custom made or commercially available computer processor, a central processing unit (CPU), an auxiliary processor among several processors associated with the computer 201, a semiconductor-based microprocessor (in the form of a microchip or chip set), a macro-processor, or generally any device for executing instructions.

The memory device 210 can include any one or combination of volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, SDRAM, etc.)) and nonvolatile memory elements (e.g., ROM, erasable programmable read only memory (EPROM), electronically erasable programmable read only memory (EEPROM), flash memory, programmable read only memory (PROM), or the like, etc.). Secondary storage 220 can include any one or combination of tape, compact disk read only memory (CD-ROM), flash drive, disk, hard disk drive, diskette, cartridge, cassette or the like, etc. Moreover, the memory device 210 and/or secondary storage 220 may incorporate electronic, magnetic, optical, and/or other types of storage media. Accordingly, the memory device 210 and/or secondary storage 220 are examples of a tangible computer readable storage medium 240 upon which instructions executable by the processing device 205 may be embodied as a computer program product. The memory device 210 and/or secondary storage 220 can have a distributed architecture, where various components are situated remotely from one another, but can be accessed by one or more instances of the processing device 205.

The instructions in memory device 210 may include one or more separate programs, each of which comprises an ordered listing of executable instructions for implementing logical functions. In the example of FIG. 2, the instructions in the memory device 210 include a suitable operating system (O/S) 211 and program instructions 216. The operating system 211 essentially controls the execution of other computer programs and provides scheduling, input-output control, file and data management, memory management, and communication control and related services. When the computer 201 is in operation, the processing device 205 is configured to execute instructions stored within the memory device 210, to communicate data to and from the memory device 210, and to generally control operations of the computer 201 pursuant to the instructions. Examples of program instructions 216 can include instructions to implement the processes as further described herein.

The computer 201 of FIG. 2 also includes a network interface 260 that can establish communication channels with one or more other computer systems via one or more network links of the network 110 of FIG. 1. The network interface 260 can support wired and/or wireless communication protocols known in the art.

FIG. 3 depicts a user interface 300 for a data modeling training tool according to an embodiment. For example, one of the user systems 112 of FIG. 1 can enable access and display of the user interface 300 that may run locally on one of the user systems 112 or remotely on another system, such as one of the data modeling servers 102 of FIG. 1. The user interface 300 can include a selectable input 302 to invoke the training tool 132 of FIG. 1 for training the data model 130. The user interface 300 can also include a selectable input 304 to display a training summary associated with results of training. For example, the selectable input 304 may trigger the training tool 132 to display aspects of the training data 136 and/or scoring summary 138 of FIG. 1. The user interface 300 can also include a selectable input 306 to execute the data model 130 of FIG. 1 after training and adjustments to the data model 130 have been performed. For example, the selectable input 306 may trigger the data modeling tool 134 of FIG. 1 to use the data model 130 to produce one or more updated definitions for the metadata 125 or tables 124 of the enterprise data warehouse 120.

It will be understood that the example of FIG. 3 is for purposes of explanation and many variations to the user interface 300 are contemplated. For example, there can be additional selectable options, status information, instructions, and/or other content displayed as part of user interface 300. Further, the selectable inputs 302-306 can be further subdivided or combined and may be selectable using any format, such as buttons, pull-down selections, scrolling inputs, menus, and other such selectable formats.

FIG. 4 depicts an example of a user interface 400 for the training tool 132 and the data modeling tool 134 of FIG. 1 in a first state according to an embodiment. The user interface 400 can be displayed, for example, in response to a detected selection of the selectable input 302 of FIG. 3. The user interface 400 can include multiple selectable options 402, 404. Selectable option 402 may trigger uploading of the enterprise abbreviation list 128 of FIG. 1, for instance, to the data modeling servers 102 for use by the training tool 132. Selectable option 404 may trigger uploading of a warehouse elements list that may be part of the metadata 125 of the enterprise data warehouse 120 of FIG. 1 for use by the training tool 132. The warehouse elements list can define names, abbreviations, data types, and other such information for elements already in use within the enterprise data warehouse 120, while the enterprise abbreviation list 128 can include existing approved abbreviations regardless of whether the abbreviations are currently in use. Status information 406 can be displayed on the user interface 400 based on selection of the selectable options 402, 404. For example, in response to selection of the selectable option 402 to upload the enterprise abbreviation list 128, the status information 406 can indicate progress of analysis being performed by the training tool 132. Some selectable options, such as a run the model option 408, may be disabled from selection until uploading and analysis steps are successfully completed.

It will be understood that the example of FIG. 4 is for purposes of explanation and many variations to the user interface 400 are contemplated. For example, there can be additional selectable options, status information, instructions, and/or other content displayed as part of user interface 400. Further, the selectable options 402, 404, 408 can be further subdivided or combined and may be selectable using any format, such as buttons, pull-down selections, scrolling inputs, menus, and other such selectable formats.

FIG. 5 depicts an example of a user interface 500 for the training tool 132 and the data modeling tool 134 of FIG. 1 in a second state according to an embodiment. Once the upload of the enterprise abbreviation list 128 of FIG. 1 is completed based on selection of the selectable option 402 and the upload of a warehouse elements list is completed based on selection of the selectable option 404, a scoring program can be executed, for instance, as part of the training tool 132. The status information 406 can indicate further status updates, such as completion of the scoring program. Completion of the scoring program may also result in enabling the run the model option 408 and may result in additional selectable options being displayed and/or enabled, such as a review option 502. The scoring program can be part of the training tool 132 or a separate program (not depicted). The scoring program can generate the scoring summary 138 of FIG. 1 and is further described herein.

It will be understood that the example of FIG. 5 is for purposes of explanation and many variations to the user interface 500 are contemplated. For example, there can be additional selectable options, status information, instructions, and/or other content displayed as part of user interface 500. Further, the selectable options 402, 404, 408, 502 can be further subdivided or combined and may be selectable using any format, such as buttons, pull-down selections, scrolling inputs, menus, and other such selectable formats.

FIG. 6 depicts a scoring summary report 600 associated with the scoring summary 138 of FIG. 1 according to an embodiment. The scoring summary report 600 can include a completion status 602 and a review selector 604 that may identify phrases having multiple possible abbreviations. Summary data can be presented in a summary table 606 and/or a summary chart 608. The contents of the summary chart 608 can be selected by a chart type selector 610. In the example of FIG. 6, a phrase of “PROPERTY” is selected for review through the review selector 604 after running the scoring program. The summary table 606 illustrates example phrases identified in the analysis that incorporate “PROPERTY”, such as “PROPERTY”, “PERSONAL PROPERTY”, “PROPERTY DAMAGE”, “PROPERTY LIMIT”, and “PROPERTY CASUALTY”, where each phrase can have an associated word, number of occurrences of the word, associated word abbreviation, number of occurrences of the associated word abbreviation, scoring percentage, and associated phrase abbreviation. In the example of FIG. 6, the chart type selector 610 is set to a phrase abbreviation chart that graphically depicts scoring percentages.

The summary table 606 and summary chart 608 can be generated according to a scoring process 700 as depicted in the example of FIG. 7. In the scoring process 700, an enterprise attribute 702 and enterprise warehouse records 704 can be extracted from the enterprise data warehouse 120 of FIG. 1 by a scoring program 706 to determine a name attribute score 708. An enterprise attribute 702 can be a word or phrase indicative of a field or column name. Each word or phrase that makes up all or a portion of an enterprise attribute 702 may be referred to as an enterprise term. The scoring program 706 can be part of the training tool 132 of FIG. 1 or part of a separate executable program. The scoring program 706 can count the number of times that an enterprise term and its corresponding abbreviation are used in the enterprise data warehouse 120 and may assign a percentage score as the name attribute score 708.
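As a non-limiting illustration, the counting performed by the scoring program 706 can be sketched as follows. The formula is not fixed by the description above; this sketch assumes that the percentage is the share of logical names containing the term whose physical column name also uses the candidate abbreviation. The function name, the record layout, and the sample records are hypothetical.

```python
def name_attribute_score(term, abbreviation, records):
    """Count uses of a term and of its abbreviation, then derive a percentage.

    records is an iterable of (logical_name, physical_name) pairs, a
    hypothetical stand-in for the enterprise warehouse records 704.
    Assumption: score = 100 * abbreviation uses / term uses.
    """
    term_count = 0
    abbr_count = 0
    for logical_name, physical_name in records:
        # A term is counted when it appears as a whole word of the logical name.
        if term in logical_name.upper().split():
            term_count += 1
            # The abbreviation is counted when it is an underscore-delimited token.
            if abbreviation in physical_name.upper().split("_"):
                abbr_count += 1
    score = 100.0 * abbr_count / term_count if term_count else 0.0
    return term_count, abbr_count, score

# Hypothetical records, for illustration only.
records = [
    ("PROPERTY LIMIT", "PROP_LMT"),
    ("PERSONAL PROPERTY", "PP"),
    ("PROPERTY DAMAGE", "PD"),
    ("PROPERTY CASUALTY", "PROP_CSLTY"),
]
print(name_attribute_score("PROPERTY", "PROP", records))  # (4, 2, 50.0)
```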

In the example of FIG. 6, the word “PROPERTY” occurs 31 times as a logical name in the enterprise data warehouse 120, but a corresponding abbreviation of “PROP” is used only 13 times as a physical column name. The logical name may represent a naming preference specific to an organization that generates or uses the data. The word “PROPERTY” is also observed in combination with other words, such as “DAMAGE” or “PERSONAL”, and a shorter abbreviation is used. For instance, the word “DAMAGE” may be observed with an abbreviation of “DMG” and “PERSONAL” abbreviated as “PRSNL” when the words are part of a logical name that has “PROPERTY” in the abbreviation with further shortening. Thus, the phrase “PERSONAL PROPERTY” may be observed with an abbreviation of “PP” instead of “PRSNL_PROP”, and “PROPERTY DAMAGE” may be observed with an abbreviation of “PD” instead of “PROP_DMG”. The corresponding score can help to determine the weight that the words hold in the enterprise data warehouse 120. The name attribute score can be further used in abbreviation logic when a physical abbreviated column name exceeds a length limit.
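As a non-limiting illustration, the use of scores when an abbreviated column name exceeds a length limit can be sketched as follows. The sketch assumes that word-by-word abbreviation is tried first and that, when the result is too long, the highest-scoring phrase abbreviation that fits is chosen. The function name, the data layout, and the score values are hypothetical.

```python
def shorten_column_name(words, word_abbr, phrase_scores, max_len):
    """Pick a physical column name that fits within max_len characters.

    word_abbr maps each word to its abbreviation; phrase_scores maps a
    phrase to a list of (abbreviation, score) options. Both are
    hypothetical stand-ins for the scoring summary 138.
    """
    name = "_".join(word_abbr[w] for w in words)
    if len(name) <= max_len:
        return name
    phrase = " ".join(words)
    fitting = [
        (score, abbr)
        for abbr, score in phrase_scores.get(phrase, [])
        if len(abbr) <= max_len
    ]
    if not fitting:
        return None  # nothing fits; left for a reviewer to resolve
    return max(fitting)[1]  # highest-scoring option that fits

# Hypothetical abbreviations and scores, for illustration only.
word_abbr = {"PERSONAL": "PRSNL", "PROPERTY": "PROP"}
phrase_scores = {"PERSONAL PROPERTY": [("PP", 60.0), ("PRSNL_PROP", 40.0)]}
print(shorten_column_name(["PERSONAL", "PROPERTY"], word_abbr, phrase_scores, 30))  # PRSNL_PROP
print(shorten_column_name(["PERSONAL", "PROPERTY"], word_abbr, phrase_scores, 5))   # PP
```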

It will be understood that the example of FIG. 6 is for purposes of explanation and many variations to the scoring summary report 600 are contemplated. For example, there can be additional selectable options, status information, instructions, tables, charts, and/or other content displayed as part of the scoring summary report 600.

FIG. 8 depicts a block diagram of a training summary 800 according to an embodiment. The training summary 800 can include a word-phrase selector 802, an organization selector 804, a summary table 806, and an associated chart 808. The word-phrase selector 802 can select a word or phrase that has been scored by the scoring process 700 of FIG. 7. The organization selector 804 can further filter the results according to how the word or phrase is scored by an organization or business unit. For example, the word “PROPERTY” may be primarily abbreviated as “PROP” based on examining data from the enterprise data warehouse 120 for two organizations but abbreviated as “PRPT” by a third organization, where all three organizations share the enterprise data warehouse 120. The training summary 800 may be reached, for example, based on a selection of the selectable input 304 of the user interface 300 of FIG. 3. The training summary 800 can enable a user to compare scoring summaries across various databases 122 of the enterprise data warehouse 120.
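As a non-limiting illustration, filtering abbreviation usage by organization can be sketched as follows. The sketch assumes that each observation pairs an organization identifier with the abbreviation seen for one word, and that the most frequently used abbreviation is the one reported for each organization. The organization names and counts are hypothetical.

```python
from collections import Counter, defaultdict

def preferred_abbreviation_by_org(observations):
    """Return the most frequently observed abbreviation per organization.

    observations is an iterable of (organization, abbreviation) pairs
    for a single word; the layout is a hypothetical simplification.
    """
    usage = defaultdict(Counter)
    for organization, abbreviation in observations:
        usage[organization][abbreviation] += 1
    return {org: counts.most_common(1)[0][0] for org, counts in usage.items()}

# Hypothetical observations for the word "PROPERTY".
observations = [
    ("ORG_A", "PROP"), ("ORG_A", "PROP"), ("ORG_A", "PRPT"),
    ("ORG_B", "PROP"),
    ("ORG_C", "PRPT"), ("ORG_C", "PRPT"),
]
print(preferred_abbreviation_by_org(observations))
# {'ORG_A': 'PROP', 'ORG_B': 'PROP', 'ORG_C': 'PRPT'}
```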

It will be understood that the example of FIG. 8 is for purposes of explanation and many variations to the training summary 800 are contemplated. For example, there can be additional selectable options, status information, instructions, tables, charts, and/or other content displayed as part of the training summary 800.

FIG. 9 depicts a process flow 900 of updating a data model, such as data model 130 of FIG. 1, according to some embodiments. The process flow 900 may be executed, for example, based on selection of the selectable input 306 of the user interface 300 of FIG. 3 or selection of the run the model option 408 of FIG. 5. The process flow 900 can be executed as part of the data modeling tool 134 of FIG. 1 or other such tool. In the example of FIG. 9, the process flow 900 starts at block 902 and advances to block 904. At block 904, a source type can be read, where the source type indicates whether a mapping document, such as the mapping document 126, should be parsed to search for changes or whether the databases 116, 122 of FIG. 1 should be analyzed to identify changes. The selection between source types can be performed through user interfaces 1000 and 1100 of FIGS. 10 and 11 for mapping-document based model generation and user interface 1400 of FIG. 14 for database-based model generation. At block 906 of process flow 900, if the source type is selected as the mapping document 126, then the process flow 900 proceeds to block 908.

At block 908, the mapping document 126 can be parsed to identify any changes. Updates may be marked within the mapping document 126, identified through a change history of the mapping document 126, or determined based on a comparison with a previous version of the mapping document 126, for example. At block 910, a logical name (also referred to as a business name or an organization identifier) and data type of an attribute can be retrieved based on one or more changes identified in the mapping document 126. For example, data types may be integers, decimals, character strings, date/time, and other such types, including field size constraints. At block 912, a recursive lookup operation can be performed for attributes by grouping words. Attributes can refer to a preferred description or field name associated with an organization. Attributes can be individual words or multiple words grouped in phrases, such as “PROPERTY”, “PERSONAL PROPERTY”, “PROPERTY DAMAGE”, “PROPERTY LIMIT”, “PROPERTY CASUALTY”, and so forth. The lookup may access enterprise abbreviations 920 from the enterprise abbreviation list 128 of FIG. 1. At block 914, if an abbreviation is not found for any single word, then at block 916 a new abbreviation is created. The new abbreviation may be displayed on a user interface or report for data model reviewer confirmation at block 918. An abbreviation associated with a new attribute can be added to enterprise abbreviations 920 of the enterprise abbreviation list 128, and the process flow 900 can continue at block 912. If an abbreviation is not found at block 914 but groupings are still available to recursively search, then the process flow 900 can continue with recursive searching at block 912. If an abbreviation is found at block 914, then an abbreviated column name can be created at block 922.
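As a non-limiting illustration, the recursive lookup of blocks 912 through 922 can be sketched as follows. The grouping strategy is an assumption: the sketch tries the longest known group of leading words first and recurses on the remainder. The rule for creating a new abbreviation (keep the first letter, drop later vowels, truncate) is only a placeholder, since the creation rule is not specified above. All function names are hypothetical.

```python
def propose_abbreviation(word, max_len=5):
    """Placeholder rule: keep the first letter, drop later vowels, truncate."""
    consonants = "".join(c for c in word[1:] if c not in "AEIOU")
    return (word[0] + consonants)[:max_len]

def abbreviate(words, abbreviations):
    """Recursively abbreviate a list of words using known groupings.

    abbreviations maps a word or phrase to its abbreviation and stands in
    for the enterprise abbreviations 920. Newly created entries are added
    to it, pending reviewer confirmation.
    """
    if not words:
        return []
    # Try the longest leading group first, then progressively shorter ones.
    for size in range(len(words), 0, -1):
        group = " ".join(words[:size])
        if group in abbreviations:
            return [abbreviations[group]] + abbreviate(words[size:], abbreviations)
    # Not even the single leading word is known, so create a new abbreviation.
    new_abbr = propose_abbreviation(words[0])
    abbreviations[words[0]] = new_abbr
    return [new_abbr] + abbreviate(words[1:], abbreviations)

# Hypothetical abbreviation list, for illustration only.
known = {"PROPERTY": "PROP", "DAMAGE": "DMG", "PROPERTY DAMAGE": "PD"}
column = "_".join(abbreviate("PROPERTY DAMAGE LIMIT".split(), known))
print(column)          # PD_LMT
print(known["LIMIT"])  # LMT
```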

At block 906, if a source type is selected as a database, then at block 924, a schema comparison of a source database and a target database can be performed to identify changes as further described herein. At block 926, an organization identifier and data type can be retrieved based on one or more changes identified in the comparison of block 924. It may be presumed that a new abbreviated column name identified from the database comparison is an adequate initial abbreviation until further refinements are considered in subsequent blocks.
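
The schema comparison of block 924 can be pictured with a minimal sketch. It assumes each schema is available as a simple mapping of column name to data type; the function name `compare_schemas`, the change labels, and the sample columns are hypothetical and are not taken from the described embodiments.

```python
def compare_schemas(source, target):
    """Return (column, data_type, change) tuples for columns that are
    new in the source or whose data type differs from the target."""
    changes = []
    for column, data_type in source.items():
        if column not in target:
            changes.append((column, data_type, "added"))
        elif target[column] != data_type:
            changes.append((column, data_type, "type_changed"))
    return changes


# Hypothetical schemas, used only for illustration.
source_schema = {
    "POLICY_ID": "integer",
    "TOTAL_PREMIUM_AMOUNT": "decimal(12,2)",
    "EFFECTIVE_DATE": "timestamp",
}
target_schema = {
    "POLICY_ID": "integer",
    "EFFECTIVE_DATE": "date",
}

changes = compare_schemas(source_schema, target_schema)
for change in changes:
    print(change)
```

Each reported change carries the name and data type that block 926 would then retrieve for further processing.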

At block 928, after block 922 or block 926 results in an abbreviation for a column name, a length check can be performed to determine whether the column name is too long. For example, acceptable column names at an organization level or possible column names may violate size constraints of the enterprise data warehouse 120 of FIG. 1. At block 930, further abbreviation options can be considered based on a scoring sheet 932 of associated words from the scoring summary 138 of FIG. 1. Word or phrase options from the scoring sheet 932 can be analyzed with respect to highest scoring results that satisfy column naming length constraints. At block 934, after performing block 930 or determining at block 928 that the column name length was acceptable, a data definition language (DDL) output can be generated for a targeted database, such as one of the databases 122 of the enterprise data warehouse 120. Generation of DDL output can be in the form of a DDL file or other object based on data type mapping 936 associated with the type of databases 122 being updated. Output can be presented through a user interface to a data model reviewer 938 for selection, confirmation, or further adjustment. At block 940, a final model is ready and verified as the data model 130 of FIG. 1, where the data model 130 can be used to update metadata 125 and/or tables 124 of the enterprise data warehouse 120. The process flow 900 terminates at block 942.
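
As a rough illustration of blocks 928 and 934, the sketch below checks a candidate column name against a length limit and emits a generic DDL statement. The 30-character limit, the table name `POLICY`, the function names, and the `ALTER TABLE ... ADD COLUMN` form are all assumptions for illustration; actual limits and DDL syntax depend on the target database technology.

```python
MAX_COLUMN_NAME_LENGTH = 30  # hypothetical limit; real limits vary by database


def exceeds_limit(column_name, limit=MAX_COLUMN_NAME_LENGTH):
    """Length check in the spirit of block 928."""
    return len(column_name) > limit


def add_column_ddl(table, column_name, data_type):
    """Build a generic ADD COLUMN statement (syntax differs across databases)."""
    return f"ALTER TABLE {table} ADD COLUMN {column_name} {data_type};"


column_name = "PROP_DMG_OCCR_PI_DED_AMT"
if exceeds_limit(column_name):
    print("column name too long; consider further abbreviation options")
else:
    print(add_column_ddl("POLICY", column_name, "decimal(12,2)"))
```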

The process flow 900 of FIG. 9 is not intended to indicate that the operations of the process flow 900 are to be executed in any particular order, or that all of the operations of the process flow 900 are to be included in every case. Additionally, the process flow 900 can include any suitable number of additional operations.

FIG. 10 depicts a user interface 1000 for generation of the data model 130 of FIG. 1 in a first state according to an embodiment. User interface 1000 indicates a simplified example for purposes of explanation, where a selectable option 1002 is available for data model generation based on the mapping document 126 of FIG. 1 (also referred to as a required data element map (RDEM)). The selectable option 1002 can alternatively allow for model generation from a source database, such as one of the databases 116 of FIG. 1. Based on selection of the mapping-document based data model generation at selectable option 1002, a selectable input 1004 can be displayed which initiates uploading of the mapping document 126 upon selection.

It will be understood that the example of FIG. 10 is for purposes of explanation and many variations to the user interface 1000 are contemplated. For example, there can be additional selectable options, status information, instructions, and/or other content displayed as part of user interface 1000. Further, the selectable options and inputs 1002, 1004 can be further subdivided or combined and may be selectable using any format, such as buttons, pull-down selections, scrolling inputs, menus, and other such selectable formats.

FIG. 11 depicts a user interface 1100 for generation of the data model 130 of FIG. 1 in a second state according to an embodiment. For example, the user interface 1100 may illustrate how the user interface 1000 of FIG. 10 appears after the selectable option 1002 for data model generation is selected and a source file 1104 of the mapping document 126 of FIG. 1 is identified and uploaded. The user interface 1100 can also include a confirmation request 1106, an output format selector 1108, a progress indicator 1110, and a review request selector 1112. The confirmation request 1106 can provide a summary of the identified changes in the mapping document 126 along with release information to confirm that the uploaded version of the mapping document 126 is desired. The output format selector 1108 can allow for selection of various database formats supported by the databases 122 of FIG. 1 and trigger creation of a data definition language file for the data model 130 that corresponds with the selected database format and identified changes in the mapping document 126. The review request selector 1112 can display a data definition language file for the data model 130 for review before committing the changes to the enterprise data warehouse 120 of FIG. 1.

It will be understood that the example of FIG. 11 is for purposes of explanation and many variations to the user interface 1100 are contemplated. For example, there can be additional selectable options, status information, instructions, and/or other content displayed as part of user interface 1100. Further, the selectable options, inputs, and status can be further subdivided or combined and may be selectable using any format, such as buttons, pull-down selections, scrolling inputs, menus, and other such selectable formats.

FIG. 12 depicts a user interface 1200 for reviewing data model output according to an embodiment. The example of FIG. 12 can be the result of selecting the review request selector 1112 of FIG. 11. The user interface 1200 may include a summary table 1202 of logical names identified as new or changed, associated data types, and physical names to be used for corresponding column names in the databases 122 of the enterprise data warehouse 120 of FIG. 1. As can be seen in the example of summary table 1202, logical names used by organizations can be long and descriptive, such as “PROPERTY DAMAGE OCCURRENCE PROTECTION AND INDEMNITY DEDUCTIBLE AMOUNT”, which can be reduced down to “PROP_DMG_OCCR_PI_DED_AMT” as a physical name for a column in the enterprise data warehouse 120 based on machine learning optimizations. The user interface 1200 can also include review questions 1204 that can be used to distinguish between options and various preferences. A submission selector 1206 can indicate that selections in the review questions 1204 are ready for submission. The results of responses to review questions 1204 can be used to train the data model 130 of FIG. 1. The review questions 1204 may provide specific abbreviation options for one or more words. The review questions 1204 may also provide a choice as to the scope of a decision, such as whether the selection should only apply for a particular selection or be applied as a change for future occurrences of the words, for instance, as an update to the enterprise abbreviation list 128.

In embodiments, the mapping document 126 of FIG. 1 can define a source to target mapping, which may include logical names, descriptions, and data types for both sources and targets. This information can be used to identify changes and key input details. The mapping document 126 can be parsed to pick up the logical names and data types of the columns being changed or added, along with target table names. In cases where a logical name is not available, the data modeling tool 134 of FIG. 1 can use natural language processing to retrieve key words from a corresponding description to create a logical name. For example, if the description is, “This field is used to store total premium amount calculated for the period the policy was active”, the data modeling tool 134 can create a logical name using key words as “TOTAL PREMIUM AMOUNT PERIOD POLICY ACTIVE”.
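
The key word retrieval described above can be approximated with a very small sketch. It uses a plain filler-word filter as a stand-in for the natural language processing mentioned in the text; the `FILLER_WORDS` set and the function name are hypothetical, chosen only so that the example description yields the logical name given above.

```python
import re

# Hypothetical filler-word list; a real NLP step would be far more general.
FILLER_WORDS = {
    "this", "field", "is", "used", "to", "store", "calculated",
    "for", "the", "was", "a", "an", "of", "and",
}


def logical_name_from_description(description):
    """Keep the words that are not in the filler list, in original order."""
    words = re.findall(r"[A-Za-z]+", description.lower())
    return " ".join(word.upper() for word in words if word not in FILLER_WORDS)


description = (
    "This field is used to store total premium amount calculated "
    "for the period the policy was active"
)
print(logical_name_from_description(description))
# TOTAL PREMIUM AMOUNT PERIOD POLICY ACTIVE
```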

Once the logical names are identified, the data modeling tool 134 can use the enterprise abbreviation list 128 of FIG. 1 to look for corresponding abbreviations. Resulting words can be grouped together to perform a lookup operation. An example is depicted in Table 1.

TABLE 1 Abbreviation Example

Words to Abbreviate  Enterprise Abbreviation  Acronym Indicator  System1  System2  System3
ACCOUNT  ACCT  ACNT
ACCOUNTING FIELD AUDIT BILLING SYSTEM  AFBS  X
ACT  ACT
ACTED  ACT
ACTION  ACT
ACTIVE  ACTV  ACT
ACTIVITY  ACTV  ACTY
WRITTEN PREMIUM  WPRM

In reference to Table 1, a logical name of “ACCOUNT WRITTEN PREMIUM AMOUNT” can be looked up as various permutations, such as: “ACCOUNT WRITTEN PREMIUM AMOUNT”, “ACCOUNT WRITTEN PREMIUM”, “ACCOUNT WRITTEN”, “WRITTEN PREMIUM AMOUNT”, “WRITTEN PREMIUM”, and “PREMIUM AMOUNT”. The result of the lookup in this example is that “WRITTEN PREMIUM” is located with an enterprise abbreviation of “WPRM”.
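
The grouped lookup can be sketched as follows. This is a minimal illustration that assumes the enterprise abbreviation list is available as a dictionary and that the longest matching word grouping is preferred; the dictionary contents are drawn from Tables 1 and 2, while the function names and the longest-match policy are assumptions.

```python
# Small excerpt of an abbreviation list, keyed by word or phrase.
ABBREVIATIONS = {
    "ACCOUNT": "ACCT",
    "WRITTEN PREMIUM": "WPRM",
    "AMOUNT": "AMT",
}


def word_groupings(words, min_len=2):
    """List contiguous multi-word groupings, longest first for each start word."""
    count = len(words)
    return [
        " ".join(words[start:start + length])
        for start in range(count)
        for length in range(count - start, min_len - 1, -1)
    ]


def abbreviate(logical_name, table):
    """Replace the longest known grouping at each position with its abbreviation."""
    words = logical_name.split()
    parts, i = [], 0
    while i < len(words):
        for length in range(len(words) - i, 0, -1):
            phrase = " ".join(words[i:i + length])
            if phrase in table:
                parts.append(table[phrase])
                i += length
                break
        else:
            # No entry found; keep the word so a new abbreviation can be created later.
            parts.append(words[i])
            i += 1
    return "_".join(parts)


print(word_groupings("ACCOUNT WRITTEN PREMIUM AMOUNT".split()))
print(abbreviate("ACCOUNT WRITTEN PREMIUM AMOUNT", ABBREVIATIONS))
# ACCT_WPRM_AMT
```

Trying the longest grouping first lets a phrase-level entry such as “WRITTEN PREMIUM” take precedence over word-by-word abbreviation.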

Individual words can be looked up to get corresponding abbreviations. If an abbreviation is not found, then a new abbreviation can be created using one or more of the following options: a.) take first few characters; b.) remove vowels; c.) remove vowels and then repeating words; d.) take first few characters of option c output. These options can be defined and updated as part of the rules 137 of FIG. 1. Thus, a logical attribute having the word “CUSTODIAN” can be abbreviated as “CUS”, “CUST”, “CSTDN”, “CST”, or “CSTD”. A preferred version of the abbreviation can be selected, or an alternate abbreviation can be created if the results of the options are not preferred. The data modeling tool 134 can observe the selected preferences and which formatting rules resulted in the selected preferences and score the preferences over time to improve abbreviation recommendations as part of the machine learning process. Once abbreviations have been determined, if a new abbreviated column name exceeds a maximum character length, the data modeling tool 134 can use scoring from the scoring summary 138 to further shorten the abbreviation based on training results. For example, “PROPERTY DAMAGE PROTECTION AND INDEMNITY OCCURRENCE LEGAL LIABILITY DEDUCTIBLE AMOUNT” may be initially abbreviated as “PROP_DMG_PRTC_AND_INDM_OCCR_LGL_LIAB_DED_AMT”, which may still be too long. The data modeling tool 134 can use the score of each word to check how many times full abbreviations of each word appear in an underlying database, such as one of the databases 122 of FIG. 1. Neighboring lower score words can be combined to create an acronym. If a lower scoring word, such as “PROPERTY”, is between two higher scoring words or does not have a lower score neighbor, then the corresponding abbreviation may remain unchanged. An example of this scoring is provided in Table 2.
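
The abbreviation options a.) through d.) can be sketched as below. This is only an illustration under stated assumptions: “first few characters” is taken to mean prefixes of three and four characters, and option c.) is interpreted as collapsing consecutive repeated letters after vowel removal. The function names are hypothetical.

```python
VOWELS = set("AEIOU")


def remove_vowels(word):
    return "".join(ch for ch in word if ch not in VOWELS)


def collapse_repeats(text):
    """Drop consecutive repeated letters (an assumed reading of option c)."""
    out = []
    for ch in text:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def abbreviation_options(word, prefix_lengths=(3, 4)):
    """Return candidate abbreviations in option order a, b, c, d without duplicates."""
    word = word.upper()
    no_vowels = remove_vowels(word)
    collapsed = collapse_repeats(no_vowels)
    candidates = [word[:n] for n in prefix_lengths]        # option a
    candidates.append(no_vowels)                           # option b
    candidates.append(collapsed)                           # option c
    candidates += [collapsed[:n] for n in prefix_lengths]  # option d
    return list(dict.fromkeys(candidates))  # preserves order, removes duplicates


print(abbreviation_options("CUSTODIAN"))
# ['CUS', 'CUST', 'CSTDN', 'CST', 'CSTD']
```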

TABLE 2 Scoring Example

Name        Abbrv  Score
PROPERTY    PROP   0.28
DAMAGE      DMG    0.6
PROTECTION  PRTC   0.3
AND         AND    0.25
INDEMNITY   INDM   0.18
OCCURRENCE  OCCR   0.65
LEGAL       LGL    0.39
LIABILITY   LIAB   0.45
DEDUCTIBLE  DED    0.8
AMOUNT      AMT    0.9

Thus, a shortened abbreviation may become “PROP_DMG_PI_OCCR_LL_DED_AMT”, where “PI” and “LL” are further abbreviated as combinations of words. Score comparisons can be relative rather than using predetermined thresholds as part of the fuzzy reasoning logic which may be defined in the rules 137 of FIG. 1. Patterns and weights of relationships between words and phrases can be refined over time as more data are processed to further refine more complex combinations of words into abbreviations. Once the abbreviation is selected, data types can be applied based on underlying database standards. For instance, a number data type defined in the mapping document 126 may become a smallint data type or a decimal data type based on size and defined standards supported by a target database of the databases 122.
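
One way to reproduce the shortening shown above from the Table 2 scores is sketched here. It is not presented as the fuzzy reasoning logic itself: it assumes that a word counts as lower scoring when its score falls below the mean score of the name, that runs of two or more neighboring lower scoring words are merged into an acronym of their first letters, and that connectives such as “AND” are left out of the acronym. All three choices, and the function name, are assumptions.

```python
# (name, abbreviation, score) rows from Table 2.
TABLE_2 = [
    ("PROPERTY", "PROP", 0.28), ("DAMAGE", "DMG", 0.6),
    ("PROTECTION", "PRTC", 0.3), ("AND", "AND", 0.25),
    ("INDEMNITY", "INDM", 0.18), ("OCCURRENCE", "OCCR", 0.65),
    ("LEGAL", "LGL", 0.39), ("LIABILITY", "LIAB", 0.45),
    ("DEDUCTIBLE", "DED", 0.8), ("AMOUNT", "AMT", 0.9),
]
CONNECTIVES = {"AND"}  # assumed to be dropped when forming an acronym


def shorten(entries):
    """Merge runs of neighboring below-mean words into acronyms."""
    mean = sum(score for _, _, score in entries) / len(entries)
    parts, run = [], []

    def flush():
        if len(run) >= 2:
            # Two or more low scoring neighbors: combine first letters.
            parts.append("".join(n[0] for n, _ in run if n not in CONNECTIVES))
        else:
            # A lone low scoring word keeps its abbreviation unchanged.
            parts.extend(abbr for _, abbr in run)
        run.clear()

    for name, abbr, score in entries:
        if score < mean:
            run.append((name, abbr))
        else:
            flush()
            parts.append(abbr)
    flush()
    return "_".join(parts)


print(shorten(TABLE_2))
# PROP_DMG_PI_OCCR_LL_DED_AMT
```

Comparing against the mean of the name being shortened keeps the comparison relative, in line with the statement that predetermined thresholds need not be used.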

As further selections are made through the review questions 1204, further supporting data can be presented to provide additional detail to assist in making selections before a final submission is performed. An example of such additional detail is provided in FIG. 13.

It will be understood that the example of FIG. 12 is for purposes of explanation and many variations to the user interface 1200 are contemplated. For example, there can be additional selectable options, status information, instructions, and/or other content displayed as part of user interface 1200. Further, the selectable options, inputs, and status can be further subdivided or combined and may be selectable using any format, such as buttons, pull-down selections, scrolling inputs, menus, and other such selectable formats.

FIG. 13 depicts a user interface 1300 of an updated abbreviation list 1302 according to an embodiment. The updated abbreviation list 1302 can include new words and derivatives along with abbreviations as updated in the enterprise abbreviation list 128 of FIG. 1 as a result of executing the data modeling tool 134 of FIG. 1. Derivatives can include words or phrases that have a similar root structure or spelling. For instance, “CUSTODIAN”, “CUSTODIAL”, and “CUSTODIANSHIP” may all be abbreviated as “CUSTD”. Similarly, “CATASTROPHE”, “CATASTROPHIC”, and “CATASTROPHICALLY” may all be considered derivatives abbreviated as “CAT”. The user interface 1300 can also include a selectable option 1304 to download the data model 130 of FIG. 1 that has been updated.

It will be understood that the example of FIG. 13 is for purposes of explanation and many variations to the user interface 1300 are contemplated. For example, there can be additional selectable options, status information, instructions, and/or other content displayed as part of user interface 1300. Further, the selectable options and inputs can be further subdivided or combined and may be selectable using any format, such as buttons, pull-down selections, scrolling inputs, menus, and other such selectable formats.

FIG. 14 depicts a user interface 1400 for generation of the data model 130 of FIG. 1 in a third state according to an embodiment. User interface 1400 indicates a selection 1402 of generating the data model 130 based on a source database, such as one of the databases 116 of FIG. 1. Based on selecting a source database as the selection 1402, a database selection interface can be opened as depicted in the example of FIG. 15.

It will be understood that the example of FIG. 14 is for purposes of explanation and many variations to the user interface 1400 are contemplated. For example, there can be additional selectable options, status information, instructions, and/or other content displayed as part of user interface 1400. Further, the selectable options and inputs can be further subdivided or combined and may be selectable using any format, such as buttons, pull-down selections, scrolling inputs, menus, and other such selectable formats.

FIG. 15 depicts a user interface 1500 for creating a data definition language output based on a selected source according to an embodiment. Selection options for a source database can include identification of a source database technology type 1502 and a source schema 1504 for a source database to be analyzed, such as one of the databases 116 of FIG. 1. The user interface 1500 can also allow for selection of a target database technology type 1506 and a target schema 1508 for a target database, such as one of the databases 122 of FIG. 1. A generation selector 1510 can trigger creation of an associated data definition language file that maps column names and data types of the source database through the data model 130 into a format compatible with the target database of the enterprise data warehouse 120 of FIG. 1.

It will be understood that the example of FIG. 15 is for purposes of explanation and many variations to the user interface 1500 are contemplated. For example, there can be additional selectable options, status information, instructions, and/or other content displayed as part of user interface 1500. Further, the selectable options and inputs can be further subdivided or combined and may be selectable using any format, such as buttons, pull-down selections, scrolling inputs, menus, and other such selectable formats.

FIG. 16 depicts a user interface 1600 with a data definition language output based on a selected source according to an embodiment. User interface 1600 expands upon the user interface 1500 of FIG. 15 to display a preview of a data definition language output 1602 associated with a source database having a source database technology type 1502 and a source schema 1504. The data definition language output 1602 can also display a corresponding definition for the target database based on a target database technology type 1506 and a target schema 1508 after selection of the generation selector 1510. The preview of data definition language output 1602 can enable a user to confirm whether the mapping of column names and data types appears as intended. The data definition language output 1602 can be stored in a file or other object in a command format that aligns with the underlying technologies and schemas to update or interpret data within databases 116, 122 of FIG. 1.

FIG. 16 also illustrates how data types can be automatically remapped between the source schema 1504 and the target schema 1508 based on rules associated with the source database technology type 1502 and the target database technology type 1506. For example, a data type of “timestamp” used by the source database technology type 1502 can be converted into a data type of “datetime” to align with the target database technology type 1506. Similarly, a data type of “long varchar” can be converted to a data type of “text”. Further, a data type of character large object (“CLOB”) can be converted to a data type of “varchar(max)”. Other such data type conversions can be supported through data conversion rules.
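
The data type conversions named above can be held in a simple rule table, as in this minimal sketch. The three rules come from the text; the dictionary layout, the function name `remap_type`, and the choice to pass unknown types through unchanged are assumptions.

```python
# Conversion rules for one hypothetical source/target technology pair.
TYPE_MAP = {
    "timestamp": "datetime",
    "long varchar": "text",
    "clob": "varchar(max)",
}


def remap_type(source_type, rules=TYPE_MAP):
    """Look up the target data type; unknown types pass through unchanged."""
    return rules.get(source_type.strip().lower(), source_type)


for source_type in ("timestamp", "long varchar", "CLOB", "integer"):
    print(source_type, "->", remap_type(source_type))
```

Keeping one rule table per pair of source and target technology types would let additional conversions be added without changing the lookup itself.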

It will be understood that the example of FIG. 16 is for purposes of explanation and many variations to the user interface 1600 are contemplated. For example, there can be additional selectable options, status information, instructions, and/or other content displayed as part of user interface 1600. Further, the selectable options and inputs can be further subdivided or combined and may be selectable using any format, such as buttons, pull-down selections, scrolling inputs, menus, and other such selectable formats.

Turning now to FIG. 17, a process flow 1700 is depicted according to an embodiment. The process flow 1700 includes a number of steps that may be performed in the depicted sequence or in an alternate sequence. The process flow 1700 may be performed by the system 100 of FIG. 1, for instance, using one or more processing systems or processing device 205 of FIG. 2. The process flow 1700 is described in reference to FIGS. 1-17.

At block 1702, the enterprise data warehouse 120 can be analyzed to determine a plurality of name attribute scores 708 based on a number of occurrences of the enterprise terms (e.g., enterprise attributes 702) and the abbreviations in the enterprise data warehouse 120. The enterprise data warehouse 120 can include a plurality of databases 122 and metadata 125 defining one or more aspects of data within the databases 122. For example, metadata 125 can include database schema information that defines how tables 124 are structured and/or related. Databases 116 and databases 122 can have at least one difference in field formats, where the field formats may represent columns of data in tables 124.
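
Occurrence-based scoring of the kind described at block 1702 can be illustrated with a short sketch. The text does not give a scoring formula, so the normalization used here (occurrences divided by the number of column names examined) is purely an assumption, as are the function name and the sample column names.

```python
from collections import Counter


def name_attribute_scores(column_names, abbreviations):
    """Score each abbreviation by how often it occurs as a column name token."""
    counts = Counter(
        token
        for name in column_names
        for token in name.upper().split("_")
        if token in abbreviations
    )
    total = len(column_names)
    return {abbr: counts[abbr] / total for abbr in abbreviations}


# Hypothetical column names standing in for warehouse metadata.
columns = ["POL_PREM_AMT", "CLM_PD_AMT", "PROP_DMG_AMT", "PROP_LIM_AMT"]
scores = name_attribute_scores(columns, ["AMT", "PROP", "DMG"])
print(scores)
# {'AMT': 1.0, 'PROP': 0.5, 'DMG': 0.25}
```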

At block 1704, a scoring summary 138 of a plurality of phrases including at least one shared word can be generated based on the enterprise terms identified in the enterprise data warehouse 120 and the name attribute scores 708. At block 1706, fuzzy reasoning logic can be applied to identify one or more relationship patterns and weights for the phrases including at least one shared word based on the scoring summary 138 to produce a plurality of training data 136 for a data model 130 associated with the enterprise data warehouse 120. At block 1708, the data model 130 can be updated with a new abbreviated field name associated with a new field name based on identifying a closest match of the new field name with the training data 136.
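
The closest-match step of block 1708 can be pictured with the sketch below. It swaps in a simple shared-word (Jaccard) similarity as a stand-in for the fuzzy reasoning logic, which the text does not specify in detail. The training entries, several of the abbreviations in them, and the function names are hypothetical.

```python
def word_overlap(name_a, name_b):
    """Jaccard similarity over the sets of words in two field names."""
    words_a, words_b = set(name_a.split()), set(name_b.split())
    return len(words_a & words_b) / len(words_a | words_b)


def closest_match(new_field_name, training_data):
    """Return (field name, abbreviation, similarity) of the best training match."""
    best = max(training_data, key=lambda known: word_overlap(new_field_name, known))
    return best, training_data[best], word_overlap(new_field_name, best)


# Hypothetical training data: field name -> abbreviated field name.
training = {
    "PROPERTY DAMAGE LIMIT AMOUNT": "PROP_DMG_LIM_AMT",
    "ACCOUNT WRITTEN PREMIUM AMOUNT": "ACCT_WPRM_AMT",
    "PERSONAL PROPERTY LIMIT": "PERS_PROP_LIM",
}

match = closest_match("PROPERTY DAMAGE DEDUCTIBLE AMOUNT", training)
print(match)
```

The matched entry shares the words “PROPERTY”, “DAMAGE”, and “AMOUNT” with the new field name, so its abbreviation pattern could seed the new abbreviated field name.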

In embodiments, the data modeling servers 102 can receive a request to update the data model 130 to include the new field name associated with a new or updated column of data for the tables 124. The enterprise data warehouse 120 can be searched to confirm whether an existing field name matches the new field name. Embodiments can also determine whether a field attribute change is requested based on matching the new field name with the existing field name. A data definition language file associated with the data model 130 can be updated based on identifying the field attribute change.

In embodiments, a data source identifier can be received by the data modeling tool 134 as part of an update or creation of the data model 130. A database schema comparison can be performed to identify one or more differences between a source database of databases 116 and a target database of databases 122 based on determining that the data source identifier indicates a database source type. A field name and a data type associated with the one or more differences can be identified. The field name associated with the one or more differences can be checked with respect to a plurality of formatting rules as part of the rules 137. An abbreviation to the field name can be applied based on the training data 136 and the formatting rules. A data definition language file can be generated for the target database based on the field name, the abbreviation of the field name, and the data type.

In embodiments, a data source identifier can be received at the data modeling tool 134. Mapping document 126 can be parsed to identify one or more differences with respect to an existing mapping document (e.g., an earlier version of mapping document 126) based on determining that the data source identifier indicates a mapping document source type. A field name and a data type associated with the one or more differences can be identified. The data modeling tool 134 can determine whether an existing abbreviation of the field name is found in the enterprise abbreviation list 128 and create an abbreviation of the field name based on identifying a matching entry in the enterprise abbreviation list 128. The abbreviation of the field name can be created based on the training data 136 in response to a failure to identify the matching entry in the enterprise abbreviation list 128. A data definition language file can be generated for a target database of the databases 122 based on the field name, the abbreviation of the field name, and the data type. A plurality of word groupings can be identified in the field name. A plurality of combinations of the word groupings can be formed for abbreviating the field name. The abbreviation of the field name can be determined based on matching at least one of the combinations with the enterprise abbreviation list 128. The scoring summary 138 can be determined for the combinations of the word groupings. The abbreviation of the field name can be selected based on the scoring summary 138 for the combinations of the word groupings. The field name associated with the one or more differences can be checked with respect to a plurality of formatting rules as part of the rules 137. The abbreviation to the field name can be applied based on the training data 136 and the formatting rules.

In embodiments, the mapping document 126 can be generated including a source to target mapping of at least one field name, description, and data type for an update of the data model 130 at a target database of the databases 122 of the enterprise data warehouse 120. A field name can be derived based on the description in response to determining that the field name is missing in the target database. A plurality of enterprise domains can be identified associated with different instances of the enterprise abbreviation list 128. An abbreviation of the field name can be determined based on comparing abbreviation data across the enterprise domains.

In embodiments, a plurality of abbreviation options can be derived by applying a plurality of abbreviation rules as part of rules 137 to a phrase based on a failure to locate a corresponding abbreviation in the enterprise abbreviation list 128. One of the abbreviation options can be selected as a derived abbreviation based on confirming that the derived abbreviation is unique with respect to the enterprise abbreviation list 128. The derived abbreviation can be compared to a maximum character length limit. The derived abbreviation can be parsed into a plurality of abbreviated words based on determining that the derived abbreviation exceeds the maximum character length limit. The enterprise data warehouse 120 can be analyzed to determine a plurality of derived abbreviation scores based on a number of occurrences of the abbreviated words and combinations of the abbreviated words in the enterprise data warehouse 120. The derived abbreviation can be modified to drop one or more characters based on the derived abbreviation scores. The abbreviation options can be output to a user interface, such as user interfaces 1200 and 1300. The derived abbreviation can be confirmed based on a selection through the user interface. The enterprise abbreviation list 128 can be updated with the derived abbreviation based on a request received through the user interface.

In embodiments, a source database selection and a source schema selection associated with the data model 130 can be received through a user interface, such as user interfaces 1500 and 1600. A target database selection and a target schema selection associated with the data model 130 can also be received through the user interface. A data definition language file can be generated based on one or more updates to the data model 130 to map a change from the source database selection and the source schema selection to the target database selection and the target schema selection.

Technical effects include automating data modeling and mapping from various data sources to an enterprise data warehouse. Discovering and learning abbreviations for words and phrases using machine learning can reduce the physical column name sizes used in the enterprise data warehouse to improve storage efficiency and reduce update time.

It will be appreciated that aspects of the present invention may be embodied as a system, method, or computer program product and may take the form of a hardware embodiment, a software embodiment (including firmware, resident software, micro-code, etc.), or a combination thereof. Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

One or more computer readable medium(s) may be utilized. The computer readable medium may comprise a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may comprise, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In one aspect, the computer readable storage medium may comprise a tangible medium containing or storing a program for use by or in connection with an instruction execution system, apparatus, and/or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may comprise any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, and/or transport a program for use by or in connection with an instruction execution system, apparatus, and/or device.

The computer readable medium may contain program code embodied thereon, which may be transmitted using any appropriate medium, including, but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. In addition, computer program code for carrying out operations for implementing aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.

It will be appreciated that aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products, according to embodiments of the invention. It will be understood that each block or step of the flowchart illustrations and/or block diagrams, and combinations of blocks or steps in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks. The computer program instructions may also be loaded on to a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

In addition, some embodiments described herein are associated with an “indication”. As used herein, the term “indication” may be used to refer to any indicia and/or other information indicative of or associated with a subject, item, entity, and/or other object and/or idea. As used herein, the phrases “information indicative of” and “indicia” may be used to refer to any information that represents, describes, and/or is otherwise associated with a related entity, subject, or object. Indicia of information may include, for example, a code, a reference, a link, a signal, an identifier, and/or any combination thereof and/or any other informative representation associated with the information. In some embodiments, indicia of information (or indicative of the information) may be or include the information itself and/or any portion or component of the information. In some embodiments, an indication may include a request, a solicitation, a broadcast, and/or any other form of information gathering and/or dissemination.

Numerous embodiments are described in this patent application, and are presented for illustrative purposes only. The described embodiments are not, and are not intended to be, limiting in any sense. The presently disclosed invention(s) are widely applicable to numerous embodiments, as is readily apparent from the disclosure. One of ordinary skill in the art will recognize that the disclosed invention(s) may be practiced with various modifications and alterations, such as structural, logical, software, and electrical modifications. Although particular features of the disclosed invention(s) may be described with reference to one or more particular embodiments and/or drawings, it should be understood that such features are not limited to usage in the one or more particular embodiments or drawings with reference to which they are described, unless expressly specified otherwise.

Devices that are in communication with each other need not be in continuous communication with each other, unless expressly specified otherwise. On the contrary, such devices need only transmit to each other as necessary or desirable, and may actually refrain from exchanging data most of the time. For example, a machine in communication with another machine via the Internet may not transmit data to the other machine for weeks at a time. In addition, devices that are in communication with each other may communicate directly or indirectly through one or more intermediaries.

A description of an embodiment with several components or features does not imply that all or even any of such components and/or features are required. On the contrary, a variety of optional components are described to illustrate the wide variety of possible embodiments of the present invention(s). Unless otherwise specified explicitly, no component and/or feature is essential or required.

Further, although process steps, algorithms or the like may be described in a sequential order, such processes may be configured to work in different orders. In other words, any sequence or order of steps that may be explicitly described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously despite being described or implied as occurring non-simultaneously (e.g., because one step is described after the other step). Moreover, the illustration of a process by its depiction in a drawing does not imply that the illustrated process is exclusive of other variations and modifications thereto, does not imply that the illustrated process or any of its steps are necessary to the invention, and does not imply that the illustrated process is preferred.

“Determining” something can be performed in a variety of manners and therefore the term “determining” (and like terms) includes calculating, computing, deriving, looking up (e.g., in a table, database or data structure), ascertaining and the like.

It will be readily apparent that the various methods and algorithms described herein may be implemented by, e.g., appropriately and/or specially-programmed computers and/or computing devices. Typically a processor (e.g., one or more microprocessors) will receive instructions from a memory or like device, and execute those instructions, thereby performing one or more processes defined by those instructions. Further, programs that implement such methods and algorithms may be stored and transmitted using a variety of media (e.g., computer readable media) in a number of manners. In some embodiments, hard-wired circuitry or custom hardware may be used in place of, or in combination with, software instructions for implementation of the processes of various embodiments. Thus, embodiments are not limited to any specific combination of hardware and software.

A “processor” generally means any one or more microprocessors, CPU devices, computing devices, microcontrollers, digital signal processors, or like devices, as further described herein.

The term “computer-readable medium” refers to any medium that participates in providing data (e.g., instructions or other information) that may be read by a computer, a processor or a like device. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks and other persistent memory. Volatile media include DRAM, which typically constitutes the main memory. Transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise a system bus coupled to the processor. Transmission media may include or convey acoustic waves, light waves and electromagnetic emissions, such as those generated during RF and IR data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EEPROM, any other memory chip or cartridge, a carrier wave, or any other medium from which a computer can read.

The term “computer-readable memory” may generally refer to a subset and/or class of computer-readable medium that does not include transmission media such as waveforms, carrier waves, electromagnetic emissions, etc. Computer-readable memory may typically include physical media upon which data (e.g., instructions or other information) are stored, such as optical or magnetic disks and other persistent memory, DRAM, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EEPROM, any other memory chip or cartridge, computer hard drives, backup tapes, Universal Serial Bus (USB) memory devices, and the like.

Various forms of computer readable media may be involved in carrying data, including sequences of instructions, to a processor. For example, sequences of instructions (i) may be delivered from RAM to a processor, (ii) may be carried over a wireless transmission medium, and/or (iii) may be formatted according to numerous formats, standards or protocols, such as Bluetooth™, TDMA, CDMA, 3G.

Where databases are described, it will be understood by one of ordinary skill in the art that (i) alternative database structures to those described may be readily employed, and (ii) other memory structures besides databases may be readily employed. Any illustrations or descriptions of any sample databases presented herein are illustrative arrangements for stored representations of information. Any number of other arrangements may be employed besides those suggested by, e.g., tables illustrated in drawings or elsewhere. Similarly, any illustrated entries of the databases represent exemplary information only; one of ordinary skill in the art will understand that the number and content of the entries can be different from those described herein. Further, despite any depiction of the databases as tables, other formats (including relational databases, object-based models and/or distributed databases) could be used to store and manipulate the data types described herein. Likewise, object methods or behaviors of a database can be used to implement various processes, such as those described herein. In addition, the databases may, in a known manner, be stored locally or remotely from a device that accesses data in such a database.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

What is claimed is:
1. A method, comprising: analyzing an enterprise data warehouse to determine a plurality of name attribute scores based on a number of occurrences of a plurality of enterprise terms and abbreviations in the enterprise data warehouse using an enterprise abbreviation list that maps the enterprise terms to the abbreviations; generating a scoring summary of a plurality of phrases comprising at least one shared word based on the enterprise terms identified in the enterprise data warehouse and the name attribute scores; applying fuzzy reasoning logic to identify one or more relationship patterns and weights for the phrases comprising at least one shared word based on the scoring summary to produce a plurality of training data for a data model associated with the enterprise data warehouse; and updating the data model with a new abbreviated field name associated with a new field name based on identifying a closest match of the new field name with the training data.
2. The method of claim 1, wherein the enterprise data warehouse comprises a plurality of databases and metadata defining one or more aspects of data within the databases.
3. The method of claim 2, wherein the databases comprise at least one difference in field formats.
4. The method of claim 1, further comprising: receiving a request to update the data model to include the new field name; and searching the enterprise data warehouse to confirm whether an existing field name matches the new field name.
5. The method of claim 4, further comprising: determining whether a field attribute change is requested based on matching the new field name with the existing field name; and updating a data definition language file associated with the data model based on identifying the field attribute change.
6. The method of claim 1, further comprising: receiving a data source identifier; performing a database schema comparison to identify one or more differences between a source database and a target database based on determining that the data source identifier indicates a database source type; identifying a field name and a data type associated with the one or more differences; checking the field name associated with the one or more differences with respect to a plurality of formatting rules; applying an abbreviation to the field name based on the training data and the formatting rules; and generating a data definition language file for the target database based on the field name, the abbreviation of the field name, and the data type.
7. The method of claim 1, further comprising: receiving a data source identifier; parsing a mapping document to identify one or more differences with respect to an existing mapping document based on determining that the data source identifier indicates a mapping document source type; identifying a field name and a data type associated with the one or more differences; determining whether an existing abbreviation of the field name is found in the enterprise abbreviation list; creating an abbreviation of the field name based on identifying a matching entry in the enterprise abbreviation list; creating the abbreviation of the field name based on the training data in response to a failure to identify the matching entry in the enterprise abbreviation list; and generating a data definition language file for a target database based on the field name, the abbreviation of the field name, and the data type.
8. The method of claim 7, further comprising: identifying a plurality of word groupings in the field name; forming a plurality of combinations of the word groupings for abbreviating the field name; and determining the abbreviation of the field name based on matching at least one of the combinations with the enterprise abbreviation list.
9. The method of claim 8, further comprising: determining the scoring summary for the combinations of the word groupings; and selecting the abbreviation of the field name based on the scoring summary for the combinations of the word groupings.
10. The method of claim 7, further comprising: checking the field name associated with the one or more differences with respect to a plurality of formatting rules; and applying the abbreviation to the field name based on the training data and the formatting rules.
11. The method of claim 1, further comprising: generating a mapping document comprising a source to target mapping of at least one field name, description, and data type for an update of the data model at a target database of the enterprise data warehouse; and deriving a field name based on the description in response to determining that the field name is missing in the target database.
12. The method of claim 11, further comprising: identifying a plurality of enterprise domains associated with different instances of the enterprise abbreviation list; and determining an abbreviation of the field name based on comparing abbreviation data across the enterprise domains.
13. The method of claim 1, further comprising: deriving a plurality of abbreviation options by applying a plurality of abbreviation rules to a phrase based on a failure to locate a corresponding abbreviation in the enterprise abbreviation list; and selecting one of the abbreviation options as a derived abbreviation based on confirming that the derived abbreviation is unique with respect to the enterprise abbreviation list.
14. The method of claim 13, further comprising: comparing the derived abbreviation to a maximum character length limit; parsing the derived abbreviation into a plurality of abbreviated words based on determining that the derived abbreviation exceeds the maximum character length limit; analyzing the enterprise data warehouse to determine a plurality of derived abbreviation scores based on a number of occurrences of the abbreviated words and combinations of the abbreviated words in the enterprise data warehouse; and modifying the derived abbreviation to drop one or more characters based on the derived abbreviation scores.
15. The method of claim 13, further comprising: outputting the abbreviation options to a user interface; confirming the derived abbreviation based on a selection through the user interface; and updating the enterprise abbreviation list with the derived abbreviation based on a request received through the user interface.
16. The method of claim 1, further comprising: receiving a source database selection and a source schema selection associated with the data model through a user interface; receiving a target database selection and a target schema selection associated with the data model through the user interface; and generating a data definition language file based on one or more updates to the data model to map a change from the source database selection and the source schema selection to the target database selection and the target schema selection.
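
For illustration only, and not as part of the claims, the fallback recited in claim 13 (deriving several abbreviation options from rules when a term is absent from the enterprise abbreviation list, then keeping an option that is unique with respect to that list) might be sketched as follows. The sample list contents, the three rules (`drop_vowels`, `first_four`, `first_three`), and all function names are hypothetical assumptions introduced for this sketch; the disclosure does not specify particular rules. The sketch also assumes non-empty, single-word terms.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical enterprise abbreviation list (term -> abbreviation).
ENTERPRISE_ABBREVIATIONS: Dict[str, str] = {
    "account": "ACCT",
    "number": "NBR",
    "customer": "CUST",
}


def drop_vowels(word: str) -> str:
    """Assumed rule: keep the first letter, drop vowels from the rest."""
    head, tail = word[0], word[1:]
    return (head + "".join(c for c in tail if c.lower() not in "aeiou")).upper()


def first_four(word: str) -> str:
    """Assumed rule: keep the first four characters."""
    return word[:4].upper()


def first_three(word: str) -> str:
    """Assumed rule: keep the first three characters."""
    return word[:3].upper()


# Order expresses an assumed preference among the rules.
ABBREVIATION_RULES: List[Callable[[str], str]] = [drop_vowels, first_four, first_three]


def derive_options(term: str) -> List[str]:
    """Apply every rule to the term and collect distinct candidates in order."""
    options: List[str] = []
    for rule in ABBREVIATION_RULES:
        candidate = rule(term)
        if candidate not in options:
            options.append(candidate)
    return options


def select_unique(options: List[str], abbreviation_list: Dict[str, str]) -> Optional[str]:
    """Return the first option not already used as an abbreviation, if any."""
    taken = set(abbreviation_list.values())
    for option in options:
        if option not in taken:
            return option
    return None


def abbreviate(term: str, abbreviation_list: Dict[str, str]) -> Optional[str]:
    """Use the list when the term is present; otherwise derive a unique option."""
    key = term.lower()
    if key in abbreviation_list:
        return abbreviation_list[key]
    return select_unique(derive_options(key), abbreviation_list)
```

Under these assumptions, `abbreviate("account", ENTERPRISE_ABBREVIATIONS)` returns the listed `"ACCT"`, while `abbreviate("policy", ENTERPRISE_ABBREVIATIONS)` falls back to the rules and returns `"PLCY"`. A fuller treatment would also confirm the selection through a user interface and enforce a maximum character length, as recited in claims 14 and 15.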